Collecting Debug Data in a Secure Chip Implementation

ABSTRACT

Mechanisms, in a processor chip, are provided for obtaining debug data from on-chip logic of the processor chip while the processor chip is in a secure mode of operation. The processor chip is placed into a secure mode of operation in which access to internal logic of the processor chip to control the internal logic of the processor chip, by mechanisms external to the processor chip, is disabled on a debug interface of the processor chip. A triggering condition of the processor chip is detected that is a trigger for initiating debug data collection from the on-chip logic. Debug data collection is performed from the on-chip logic to generate debug data. Data is output, by the processor chip to an external mechanism, on the debug interface based on the debug data.

BACKGROUND

The present application relates generally to an improved data processing apparatus and method and more specifically to mechanisms for collecting debug data in a secure chip implementation.

Modern processor chips typically include debug interfaces, e.g., Joint Test Action Group (JTAG) debug interface, IBM Field Replaceable Unit (FRU) Service Interface (available from International Business Machines (IBM) Corporation of Armonk, N.Y.), I2C Slave, etc., which are used during manufacturing, testing and operation to extract debug information from the processor chip in order to ensure that the processor chip functions properly. However, once a processor chip is installed in a secure product, i.e. a computing or electronic device, and thus is “in the field”, these debug interfaces are typically locked so that the processor chip operates in a secure mode. This is to eliminate a pathway by which intruders may obtain access to the processor and control it in an undesirable manner. As a result, debug information cannot be obtained via these debug interfaces after the processor chip has been put into service due to the secure mode of operation and the disablement of the debug interfaces.

In order to address this issue, some solutions have been offered but all of them suffer from various drawbacks. For example, IBM RiscWatch, available from IBM Corporation, ARM EJTAG, and Extended Debug Probe (XDP) available from Intel Corporation, all use a JTAG (IEEE 1149.1) interface built into the processor to gain access from an external debug probe to processor internal registers for extracting debug information from the processor chip. Security is very difficult to implement and verify for such JTAG interfaces. Access protection, i.e. no access or read-only access, has to be determined at chip design time for every individual register bit. Logic side-effects or missed functionality easily break either security or function of the chip, which results in a new silicon release of the processor chip being required. For example, assume that a particular register needs to be accessed even in secure chip operation, i.e. after the secure chip is fabricated and deployed in a product. Instead of keeping the debug interface fully closed, an exception may be made for the particular register. However, this solution does not allow one to add any other register exceptions later on due to the fact that the exceptions must be implemented “in silicon.”

Another solution in the x86 processor chip based systems is the Non-Maskable Interrupt (NMI) debugger. The NMI debugger is a piece of code in the basic input/output system (BIOS) that is started when a fatal error occurs or a physical button on the front of the computing device is pressed. The NMI debugger provides a debugger that accesses all registers in-band, i.e. within the processor chip itself having full control of the processor. The NMI debugger is implemented as part of the operating system, where when pressing a physical button on the computing device, the operating system would jump to a special exception vector where the operating system placed debugging code. With the NMI debugger, there is no hardware access protection and the NMI debugger is dependent on a fully functional main processor, i.e. non-failing, executing code.

SUMMARY

In one illustrative embodiment, a method, in a processor chip, is provided for obtaining debug data from on-chip logic of the processor chip while the processor chip is in a secure mode of operation. The method comprises placing, by the processor chip, the processor chip into a secure mode of operation in which access to internal logic of the processor chip to control the internal logic of the processor chip, by mechanisms external to the processor chip, is disabled on an interface of the processor chip. The method further comprises detecting, by the processor chip, a triggering condition of the processor chip that is a trigger for initiating debug data collection from the on-chip logic. Moreover, the method comprises performing, by the processor chip, debug data collection from the on-chip logic to generate debug data. In addition, the method comprises outputting, by the processor chip to an external mechanism via the interface, data generated based on the debug data.

In other illustrative embodiments, a processor chip is provided that comprises various logic elements for implementing the various operations of the method described above. For example, the processor chip may comprise interface logic that provides a communication pathway between internal logic of the processor chip and an external mechanism. Moreover, the processor chip may comprise hardware logic that places the processor chip into a secure mode of operation in which access to internal logic of the processor chip to control the internal logic of the processor chip, by the external mechanism to the processor chip, is disabled on an interface of the processor chip. Furthermore, the processor chip may comprise health monitoring logic that detects a triggering condition of the processor chip that is a trigger for initiating debug data collection from on-chip logic while the processor chip is in the secure mode of operation. In addition, the processor chip may comprise a debug data collection engine that collects debug data from the on-chip logic to generate debug data while the processor chip is in the secure mode of operation, wherein the debug data collection engine generates data based on the debug data and the data is output to an external mechanism via the interface while the processor chip is in the secure mode of operation.

In still other illustrative embodiments, a computer program product comprising a computer usable or readable medium having a computer readable program is provided. The computer readable program, when executed on a computing device, causes the computing device to perform various ones of, and combinations of, the operations outlined above with regard to the method illustrative embodiment.

In yet another illustrative embodiment, a system/apparatus is provided. The system/apparatus may comprise a processor chip comprising one or more processor cores. The processor cores may be coupled to a memory. The processor chip may comprise logic for implementing the various operations outlined above with regard to the method. For example, the processor chip may implement the logic described above with regard to the processor chip illustrative embodiment.

These and other features and advantages of the present invention will be described in, or will become apparent to those of ordinary skill in the art in view of, the following detailed description of the example embodiments of the present invention.

BRIEF DESCRIPTION OF THE SEVERAL VIEWS OF THE DRAWINGS

The invention, as well as a preferred mode of use and further objectives and advantages thereof, will best be understood by reference to the following detailed description of illustrative embodiments when read in conjunction with the accompanying drawings, wherein:

FIG. 1 is an example block diagram of a data processing system in which aspects of the illustrative embodiments may be implemented;

FIG. 2 is an example block diagram illustrating the primary operational elements of an on-chip debug data collection mechanism in accordance with one illustrative embodiment where collected debug data is pulled from the debug data buffer by external devices;

FIG. 3 is an example block diagram illustrating the primary operational elements of an on-chip debug data collection mechanism in accordance with another illustrative embodiment where collected debug data is pushed by the debug data collection engine to external devices; and

FIG. 4 is an example diagram illustrating a process for outputting debug information in a secure chip environment in accordance with one illustrative embodiment.

DETAILED DESCRIPTION

The illustrative embodiments provide a mechanism for allowing debug information to be collected from a chip after debug interfaces of the chip have been disabled and the chip is placed “in the field,” i.e. when the processor chip is operating in a secure mode that does not permit external control access to the internal mechanisms of the processor chip. The illustrative embodiments make use of a power-on-reset (POR) engine built into the chip that is typically used at chip initialization to perform special operations for initializing the chip, such as configuring scan-rings and setup of processor registers. Following this chip initialization, the POR engine is typically stopped and not used again until a next power-on event requiring chip initialization, i.e. after power has been lost to the chip and resumed, e.g., due to a reset operation or the like.

With the mechanisms of the illustrative embodiments, rather than stopping the POR engine after chip initialization, the POR engine continues to operate and is used to monitor the health of the processor chip by monitoring error status on the chip, e.g., monitoring the status of a checkstop bit, which is used to indicate a stop of the processor, i.e. freeze the logic state of the processor, so as to avoid operating on corrupted data. That is, individual processor units (processing cores, memory controller, accelerators, PCIe-cores, elastic interface/multichip-links, internal-processor busses, and the like) have their own checking mechanisms, e.g., error correction code (ECC) or parity error mechanisms. When a unit discovers unrecoverable errors, it will trigger this checkstop bit by writing a value to this checkstop bit. Furthermore, the checkstop bit will also inform any other unit on the chip about this unrecoverable error which will cause the whole chip to freeze. This prevents data corruption and initiates gathering debug data and initiation of recovery operations, e.g., reboot, dynamic replacement of the failed processing unit, or the like.
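As a purely illustrative software model of the checkstop behavior described above, the following minimal sketch shows one unit raising an unrecoverable error, the checkstop bit being written, and every unit on the chip freezing as a result. The class names `Unit` and `Chip` and the method `report_unrecoverable_error` are hypothetical and are not taken from any actual implementation; real hardware would implement this behavior in logic rather than in code.

```python
from dataclasses import dataclass


@dataclass
class Unit:
    """Hypothetical model of one on-chip unit (core, memory controller, ...)."""
    name: str
    frozen: bool = False


@dataclass
class Chip:
    """Hypothetical model of a chip with a single chip-wide checkstop bit."""
    units: list
    checkstop: bool = False
    failing_unit: str = ""

    def report_unrecoverable_error(self, unit_name):
        # The failing unit writes a value to the checkstop bit.
        self.checkstop = True
        self.failing_unit = unit_name
        # The checkstop bit informs every other unit, so the whole chip
        # freezes its logic state and stops operating on corrupted data.
        for unit in self.units:
            unit.frozen = True


chip = Chip([Unit("core0"), Unit("core1"), Unit("memory_controller")])
chip.report_unrecoverable_error("memory_controller")
```

In this model, freezing every unit preserves the logic state at the moment of failure, which is the state that later debug data collection would read.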

If an error is detected to have occurred, the processor cores on the processor chip are configured to stop operation immediately, i.e. perform a checkstop operation. If the POR engine detects such a condition, the POR engine executes debug data collection engine logic which collects data from the various parts of the processor chip and stores this debug data in a debug data buffer or other storage mechanism that is accessible in a read-only manner via one or more interfaces of the processor chip. Alternatively, the collected debug data may be pushed to the one or more interfaces without requiring storage in an on-chip debug data buffer.
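The two output alternatives described above, i.e. storing collected data in a debug data buffer for later retrieval versus pushing it directly to an interface, may be sketched as follows. This is a simplified software model under stated assumptions: the function names `collect_debug_data` and `on_checkstop`, the source names, and the register values are all hypothetical and chosen only for illustration.

```python
def collect_debug_data(sources):
    """Read every registered source once and return a name -> value snapshot."""
    return {name: read() for name, read in sources.items()}


def on_checkstop(sources, debug_buffer=None, push=None):
    """Collect debug data, then store it (pull model) and/or push it out."""
    snapshot = collect_debug_data(sources)
    if debug_buffer is not None:
        # Pull model: data waits in the buffer until an external device reads it.
        debug_buffer.update(snapshot)
    if push is not None:
        # Push model: data goes straight to the interface, no on-chip buffer.
        push(snapshot)
    return snapshot


# Hypothetical sources; the values are placeholders, not real register contents.
sources = {"core0_status": lambda: 0x8000, "mc_status": lambda: 0x0001}

debug_buffer = {}
on_checkstop(sources, debug_buffer=debug_buffer)

pushed = []
on_checkstop(sources, push=pushed.append)
```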

As will be appreciated by one skilled in the art, aspects of the present invention may be embodied as a system, method, or computer program product. Accordingly, aspects of the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment (including firmware, resident software, micro-code, etc.) or an embodiment combining software and hardware aspects that may all generally be referred to herein as a “circuit,” “module” or “system.” Furthermore, aspects of the present invention may take the form of a computer program product embodied in any one or more computer readable medium(s) having computer usable program code embodied thereon. The computer program product may be used to distribute the computer usable program code that is used to implement the mechanisms of the illustrative embodiments within a processor chip's hardware mechanisms, for example.

Any combination of one or more computer readable medium(s) may be utilized. The computer readable medium may be a computer readable signal medium or a computer readable storage medium. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, device, or any suitable combination of the foregoing. More specific examples (a non-exhaustive list) of the computer readable storage medium would include the following: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or Flash memory), or any suitable combination of the foregoing. In the context of this document, a computer readable storage medium may be any tangible medium that can contain or store a program for use by or in connection with an instruction execution system, apparatus, or device. The media on which the debug data collection mechanisms are stored may be part of a chip security envelope and thus, there are mechanisms provided to protect the debug data collection mechanisms against modification of any kind.

Computer program code for carrying out operations for aspects of the present invention may be written in any combination of one or more programming languages, including an object oriented programming language such as Java™, Smalltalk™, C++, or the like, and conventional procedural programming languages, such as the “C” programming language or similar programming languages. In some illustrative embodiments, the implementation programming language is a POR-engine assembly or C-code. The program code may be executed entirely in a chip security envelope.

Aspects of the present invention are described below with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to the illustrative embodiments of the invention. It will be understood that each block of the flowchart illustrations and/or block diagrams, and combinations of blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks.

These computer program instructions may also be stored in a computer readable medium that can direct a computer, other programmable data processing apparatus, or other devices to function in a particular manner, such that the instructions stored in the computer readable medium produce an article of manufacture including instructions that implement the function/act specified in the flowchart and/or block diagram block or blocks.

The computer program instructions may also be loaded onto a computer, other programmable data processing apparatus, or other devices to cause a series of operational steps to be performed on the computer, other programmable apparatus, or other devices to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide processes for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks.

The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present invention. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems that perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.

Thus, the illustrative embodiments may be utilized in many different types of data processing environments. In order to provide a context for the description of the specific elements and functionality of the illustrative embodiments, FIG. 1 is provided hereafter as an example environment in which aspects of the illustrative embodiments may be implemented. It should be appreciated that FIG. 1 is only an example and is not intended to assert or imply any limitation with regard to the environments in which aspects or embodiments of the present invention may be implemented. Many modifications to the depicted environment may be made without departing from the spirit and scope of the present invention.

With reference now to FIG. 1, a block diagram of an example data processing system is shown in which aspects of the illustrative embodiments may be implemented. Data processing system 100 is an example of a computing system in which computer usable code or instructions implementing the processes for illustrative embodiments of the present invention may be located or in which hardware logic or a combination of hardware logic and computer usable code/instructions may be provided for implementing the various aspects of the illustrative embodiments described hereafter, or equivalents thereof.

In the depicted example, data processing system 100 may be any known or later developed data processing system in which the mechanisms of the illustrative embodiments are implemented in one or more hardware devices of the data processing system. These hardware devices may perform various functions including operating as the central processing unit (CPU) of the data processing system 100, a communications hardware device, a storage controller or other storage hardware device, a cryptographic processor, a network security device, or any other hardware device from which debug information may be retrieved via interfaces that are typically disabled after manufacturing and testing, i.e. after deployment of the hardware device in the data processing system 100.

In the depicted example, the data processing system 100 employs the mechanisms of the illustrative embodiments in one or more processing unit(s) 106 of the data processing system. The one or more processing unit(s) may comprise one or more of a central processing unit (CPU), a service processor, a co-processor, a cryptographic processor, a storage controller, a communications processor, or the like. It should be appreciated that the depicted example is only an example and is not intended to state or imply any limitation as to the types of hardware devices in which the mechanisms of the illustrative embodiments may be implemented. Any hardware device from which debug information may be retrieved, whether that hardware device is in a secure operational state or a non-secure operational state, may be used to implement the mechanisms of the illustrative embodiments without departing from the spirit and scope of the present invention.

As shown in FIG. 1, the processing unit(s) 106 comprise one or more processor cores 110 and 112 (although two are shown in FIG. 1, any number of processor cores is possible), one or more communications interfaces 114 and 116 (which in the depicted example are a PCIe interface 114 and a network interface 116, as examples), a memory controller 118, a cryptographic processor core 120, and symmetric multiprocessor (SMP) links 130. The processor cores 110 and 112 operate to execute instructions and process data to generate outputs, as is generally known in the art. The PCIe interface 114 provides a communications interface through which communication with one or more PCIe devices 102, external to the processing unit(s) 106, is made possible. PCIe devices may include, for example, Ethernet adapters, add-in cards, and PC cards for notebook computers.

The network interface 116 provides a communication interface through which the processing unit(s) 106 may communicate with other devices via one or more data networks. The memory controller 118 coordinates and controls access to memory 104 which is external to the processing unit(s) 106. The cryptographic processor core 120 operates to perform cryptographic functions on data being read or written by the processing unit(s) 106 with regard to the memory 104 as well as communicated over the communication interfaces 114 and 116, for example. The SMP links 130 serve to provide communication links between the processing unit(s) 106 and other processing unit(s) within the same data processing system that may operate in concert as a symmetric multiprocessor (SMP) data processing system 100. The data processing system 100 may further be coupled to a ROM 108 and other devices (not shown) to facilitate further functionality in the processing unit(s) 106. The operation of these elements 102-104, 108, and 110-130 is generally known in the art and thus, a more detailed explanation of the functions of each of these elements is not provided herein.

Those of ordinary skill in the art will appreciate that the hardware in FIG. 1 may vary depending on the implementation. Other internal hardware or peripheral devices, such as flash memory, equivalent non-volatile memory, or optical disk drives and the like, may be used in addition to or in place of the hardware depicted in FIG. 1. Also, the processes of the illustrative embodiments may be applied to a multiprocessor data processing system, other than the SMP system mentioned previously, without departing from the spirit and scope of the present invention. Moreover, the processing unit(s) 106 in FIG. 1 need not have any or all of the elements 110-130 in some embodiments, although generally the processing unit(s) 106 will have at least one core and a memory controller or equivalent logic for interfacing with a memory. Furthermore, the data processing system 100 need not have PCIe devices 102, memory 104, and/or ROM 108, although some memory will typically be included.

Moreover, the data processing system 100 may take the form of any of a number of different data processing systems including client computing devices, server computing devices, a tablet computer, laptop computer, telephone or other communication device, a personal digital assistant (PDA), router, printer, or any other data processing device or system in which debug information may need to be retrieved from a hardware device using disabled interfaces. In some illustrative examples, data processing system 100 may be a portable computing device which is configured with flash memory to provide non-volatile memory for storing operating system files and/or user-generated data, for example. Essentially, data processing system 100 may be any known or later developed data processing system without architectural limitation.

In accordance with the mechanisms of the illustrative embodiments, the processing unit(s) 106 of the data processing system 100 may further comprise health monitoring logic 140, such as power-on-reset (POR) logic 140, debug data collection engine logic 150, optional debug data buffer logic 160, and one or more debug interfaces 170. While debug data collection engine logic 150 is shown as separate from the health monitoring logic 140 (or POR logic 140), it should be appreciated that these elements may be combined or even partially combined such that these elements 140 and 150 logically overlap. In one illustrative embodiment, the debug data collection engine logic 150 may be considered an extension of the POR logic 140.

In accordance with the mechanisms of the illustrative embodiments, after manufacturing and testing of the processing unit 106, and in preparation for deployment of the processing unit 106 in the data processing system 100 such that it is “in the field” and operational within the data processing system 100, the processing unit 106 is placed in a secure mode of operation. In this secure mode of operation, the debug interface(s) 170 of the processing unit 106 are disabled such that general access by external mechanisms to the internal logic of the processing unit 106 is disabled and external control of the internal logic of the processing unit 106 is not possible. In one illustrative embodiment, while this general access is disabled, access may be provided to a debug data buffer 160 only, via these interfaces 170, such that debug data may be output to external debug systems and storage mechanisms. In general, only read-only access to the debug data buffer 160 is made possible through the debug interfaces 170.
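The access policy described above may be modeled, for illustration only, as an interface object that always permits reads of the debug data buffer but rejects any attempt to read or write internal registers while in secure mode. The class `SecureDebugInterface` and its method names are hypothetical; an actual chip would enforce this policy in hardware on the debug interface rather than in software.

```python
class SecureDebugInterface:
    """Hypothetical model of a debug interface with a secure mode."""

    def __init__(self, registers, debug_buffer, secure_mode=True):
        self._registers = registers        # internal chip state
        self._debug_buffer = debug_buffer  # filled by on-chip collection logic
        self.secure_mode = secure_mode

    def read_debug_buffer(self):
        # Read-only access: return a copy so the caller cannot modify the buffer.
        return bytes(self._debug_buffer)

    def read_register(self, address):
        if self.secure_mode:
            raise PermissionError("register access is disabled in secure mode")
        return self._registers[address]

    def write_register(self, address, value):
        if self.secure_mode:
            raise PermissionError("external control is disabled in secure mode")
        self._registers[address] = value


registers = {0x10: 0xDEAD}
interface = SecureDebugInterface(registers, bytearray(b"\x01\x02"), secure_mode=True)
dump = interface.read_debug_buffer()

try:
    interface.write_register(0x10, 0)
    write_blocked = False
except PermissionError:
    write_blocked = True
```

Returning a copy from `read_debug_buffer` is a design choice of this sketch: it keeps the external reader from altering the on-chip contents, which mirrors the read-only property stated above.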

The POR logic 140, which as discussed above is generally only used by prior art mechanisms to perform processor chip initialization and then is shut down and not used while the processor chip is in an operational mode, is instead maintained operational even after processor chip initialization. The POR logic 140 monitors the health of the processing unit 106 to detect an error condition of the processor chip which causes the processor core(s) of the processor chip to stop operation immediately, i.e. a checkstop condition. In such a case, the POR logic 140 instructs debug data collection logic 150 to collect debug data from the logic on the processor chip, including the processor cores, and store that debug data in a debug data buffer 160 or otherwise directly output the debug data to the debug interface(s) 170 of the processor chip.

In this way, debug data is able to be obtained from the logic on the processor chip via the debug interfaces 170 even after deployment of the processor chip “in the field” and after the processor chip has been placed into a secure mode of operation disabling general access to the internal logic of the processor chip via the debug interface(s). This debug data may be used in many different ways once it is obtained from the processor chip. For example, the debug data may be used to identify and isolate field replaceable units (FRUs) for potential replacement. As another example, this debug data may be used for analyzing the problem encountered by the processor chip so that later improvement of hardware and/or software of the processor chip may be made.

FIG. 2 is an example block diagram illustrating the primary operational elements of an on-chip debug data collection mechanism in accordance with one illustrative embodiment. As shown in FIG. 2, a processor chip 200 may be provided that comprises a plurality of processor cores 202-206, a memory controller 208, a crypto controller 207 and a PCIe interface 205 coupled to one another via a processor bus 210. A pervasive bus 209 couples the processor cores 202-206, memory controller 208, crypto controller 207 and PCIe interface 205 with debug data collection engine logic 220 and debug data buffer 230. Other logic on the processor chip 200, not explicitly shown in FIG. 2, may likewise be coupled to the debug data collection engine logic 220 and debug data buffer 230 via the pervasive bus 209 or other data connection(s).

The term “pervasive” refers to the use of a pervasive chiplet which is a special unit that is tasked with configuring and enabling units of a processor chip (referred to as “chiplets”), e.g., processing units, memory controllers, PCIe-cores, accelerator units, etc., when the processor chip operation is started. The pervasive chiplet may connect to any other unit on the processor chip 200 and is automatically clocked such that it can initialize any other unit or gather debug data using the mechanisms of the illustrative embodiments. The connections to the other units of the processor chip 200 are implemented by the pervasive bus 209. It should be noted that even in a severe error situation, such as a checkstop condition or the like, the pervasive logic, i.e. the standby region logic connected to the pervasive bus 209, is still alive and operational. This includes the mechanisms of the illustrative embodiments, including the debug data collection engine logic 220, debug data buffer 230, and the like within the standby region pervasive 290.

The debug data collection engine logic 220 controls the collection of debug data from the various logic elements of the processor chip 200, e.g., processor cores 202-206, memory controller 208, crypto controller 207 and PCIe interface 205, as well as the output of this debug data to the interfaces 240 of the processor chip 200, which may be dedicated debug interfaces or standard interfaces used during normal processor runtime but shared with debug logic. The debug data collection engine logic 220 may operate according to a debug code stored in code memory 225. This debug code may further provide instructions for governing an analysis performed by the debug data collection engine logic 220 on the collected debug data to determine what debug data should be output on the output interfaces 240.
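The description above does not specify what analysis the debug code performs. As one hypothetical example of such an analysis, the following sketch outputs only those collected values that differ from an expected idle value, so that unremarkable state is not sent to the output interfaces. The function `select_for_output`, the entry names, and the idle values are assumptions made purely for illustration.

```python
def select_for_output(collected, expected_idle):
    """Keep only entries whose value differs from the expected idle value.

    Entries without a listed idle value are compared against 0.
    """
    return {
        name: value
        for name, value in collected.items()
        if value != expected_idle.get(name, 0)
    }


# Hypothetical collected state; names and values are placeholders.
collected = {"core0_fir": 0x0, "mc_fir": 0x40, "pcie_status": 0x3}
expected_idle = {"pcie_status": 0x3}

to_output = select_for_output(collected, expected_idle)
```

Here only `mc_fir` would be selected, since `core0_fir` is zero and `pcie_status` matches its idle value. Other selection rules are equally plausible.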

The debug data collection engine logic 220 may operate in response to a command from on-chip health monitoring logic 250 that monitors the health of the chip 200 for the occurrence of an error condition that causes one or more of the processor cores 202-206 and/or the memory controller 208, crypto controller 207, PCIe interface 205, or other critical logic of the processor chip 200, to fail or stop operating properly. In one illustrative embodiment, this on-chip health monitoring logic 250 is a power-on-reset (POR) engine built into the processor chip 200 and whose primary purpose is to assist with chip 200 initialization in response to the occurrence of a powering up of the processor chip, such as in the event of a turning on of the data processing system in which the processor chip 200 is present, in response to a reset operation, or the like. Contrary to known mechanisms, instead of stopping the operation of the POR engine after initialization of the processor chip 200, the illustrative embodiments may maintain the operation of the POR engine but in a health monitoring mode.
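The change in POR engine behavior described above, i.e. moving into a health monitoring mode after initialization instead of stopping, may be viewed as a small state machine. The following sketch is a hypothetical model; the state names and the `next_state` function are illustrative assumptions and do not describe an actual POR engine instruction set.

```python
from enum import Enum, auto


class PorState(Enum):
    """Hypothetical operating states of the POR engine."""
    INITIALIZING = auto()  # configuring scan rings, setting up registers
    MONITORING = auto()    # health monitoring mode after initialization
    COLLECTING = auto()    # debug data collection after an error


def next_state(state, init_done=False, error_detected=False):
    """Advance the model by one step; the engine is never stopped."""
    if state is PorState.INITIALIZING and init_done:
        return PorState.MONITORING
    if state is PorState.MONITORING and error_detected:
        return PorState.COLLECTING
    return state


state = PorState.INITIALIZING
state = next_state(state, init_done=True)
after_init = state
state = next_state(state, error_detected=True)
after_error = state
```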

The on-chip health monitoring logic 250, e.g., POR engine, is essentially a minimalistic micro-processor which supports special operations for chip initialization, e.g., configuring of scan rings and setup of processor registers, and further supports special operations for health monitoring in accordance with the illustrative embodiments. The on-chip health monitoring logic 250, or POR engine, may monitor the condition of one or more trigger register values in one or more trigger value registers 260 that are indicative of one or more health states of the processor chip. In one illustrative embodiment, the one or more trigger value registers 260 comprise a checkstop value register that is written to in response to one or more of the critical logic elements of the processor chip 200, e.g., processor cores 202-206, memory controller 208, crypto controller 207 or PCIe interface 205, encountering an error or failure that causes a logic element to stop operating or stop operating correctly. The on-chip health monitoring logic 250 may continuously or periodically poll the state of these one or more trigger value registers 260 to determine if an error state exists. Alternatively, wake-and-go logic may be associated with the one or more trigger value registers 260 such that when the value of a trigger value register 260 is written, the health monitoring logic 250, or POR engine, may be awakened so as to investigate the state of the value written to the one or more trigger value registers.
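The two detection alternatives described above, polling the trigger value registers versus being awakened when a register is written, may be contrasted with the following minimal sketch. The class `TriggerRegister`, the `CHECKSTOP` bit position, and the callback mechanism are hypothetical stand-ins for hardware behavior and are assumptions of this example.

```python
CHECKSTOP = 0x1  # assumed bit position, for illustration only


class TriggerRegister:
    """Hypothetical trigger value register with optional wake-and-go hook."""

    def __init__(self):
        self.value = 0
        self._on_write = None

    def arm_wake_and_go(self, callback):
        # The callback stands in for hardware that wakes the POR engine.
        self._on_write = callback

    def write(self, value):
        self.value = value
        if self._on_write is not None:
            self._on_write(self)


def poll(registers):
    """Polling alternative: return names of registers showing an error state."""
    return [name for name, reg in registers.items() if reg.value & CHECKSTOP]


registers = {"checkstop": TriggerRegister()}
before = poll(registers)

# Wake-and-go alternative: the monitor is notified as soon as a write occurs.
wake_events = []
registers["checkstop"].arm_wake_and_go(lambda reg: wake_events.append(reg.value))
registers["checkstop"].write(CHECKSTOP)

after = poll(registers)
```

Polling keeps the monitor simple but introduces detection latency of up to one polling interval, while wake-and-go lets the monitor stay idle until a register is actually written.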

When the processor chip 200 is manufactured and passes manufacturer testing, the processor chip 200 is placed into a secure mode of operation. The processor chip is then put into operation “in the field,” e.g., is installed into a data processing system, such as part of the processing unit 106 in data processing system 100 of FIG. 1, for example, and is made operational, still operating in a secure mode of operation. As part of this secure mode of operation, the inbound debug interfaces 270 are locked and are not able to be used by external mechanisms to obtain control of the internal logic of the processor chip 200 (this is represented by the octagons with the “S” intended to represent a “stop” sign). To the contrary, with the mechanisms of the illustrative embodiments, when in a secure mode of operation, the inbound debug interfaces 270 can only be used for read-only access of the debug data buffer 230.

When the processor chip 200 is powered on, or in response to a reset operation, the on-chip health monitoring logic 250 may perform power-on-reset (POR) operations to assist in initializing the processor chip 200 to an initial operational state by, for example, configuring scan-rings, processor registers, and the like. After performing this POR initialization operation, the on-chip health monitoring logic 250 begins monitoring the health of the logic of the processor chip 200. In one illustrative embodiment, the on-chip health monitoring logic 250 monitors the state of the logic of the processor chip 200 by monitoring the state of values in triggering value registers 260. A value in one or more of these triggering value registers 260 may indicate that an error condition has occurred in logic of the processor chip 200, e.g., in one of the processor cores 202-206, memory controller 208, crypto controller 207, PCIe interface 205, or the like.

In response to detecting the error condition, the on-chip health monitoring logic 250 may send a command to the debug data collection engine logic 220 to initiate collection of debug data from the various logic elements of the processor chip 200, e.g., from the pervasive logic interfaces (PLIs) of the processor cores 202-206, memory controller 208, crypto controller 207, PCIe interface 205, and the like, via the pervasive bus 209. The data that is collected may comprise, for example, any logic state information, e.g., register bit state, memory cell state, etc. inside the chip that may assist in debugging operations. For example, it may be helpful to know the current address of a transfer when an interface fails. While the address bits are not part of an error register, the address bits would assist with debug operations and can be gathered from other registers. The actual collection of the data may be performed in various ways depending on the type of data and where it is being collected from. For example, data may be collected from fault isolation registers (FIRs) and/or by dumping configuration and status register information to a storage location. Further data collection may be done with regard to scan rings which contain a large number of bits from various places within the processor chip and which may essentially contain all current logic states of the whole chip. Moreover, data may be obtained from memory dumps or the like, e.g., dumping the contents of the cache memory or the like. The data that is collected, or at least a portion of the data collected, may be stored in the debug data buffer 230.

In one illustrative embodiment, the debug data collection engine logic 220 may determine what data to extract from the various logic elements of the processor chip 200 based on code stored in the code memory 225. Moreover, the code in the code memory 225 may specify analysis to be performed by the debug data collection engine logic 220 in order to determine what debug data to store in the debug data buffer 230 and/or output on the output debug interfaces 240. This code may further provide instructions executable by the debug data collection engine logic 220 to analyze the debug data to perform on-chip internal debugging of the error condition of the processor chip 200. That is, the code in the code memory 225 may be executed by the debug data collection engine 220 to determine a source of the error condition and possible solutions to the error condition so that the operation of the processor chip 200 may be altered and/or rebooted to avoid the error condition.

The code in the code memory 225 is preferably modifiable during and up to a final stage of the manufacturing and testing stage of the processor chip 200 fabrication. However, once manufacturing and testing is finalized, the code in the code memory 225 is made read-only and is kept secure in the code memory 225. In one illustrative embodiment, the code may be encrypted in the code memory 225 using cryptographic mechanisms, such as signatures, keys, or the like. In some illustrative embodiments, the code memory may be a programmable read-only memory (PROM) or the like.

The results of the analysis performed by the code in the code memory 225 as executed by the debug data collection engine logic 220 may be stored in the debug data buffer 230 and/or output to the output interfaces 240 for use by external equipment (not shown). In one illustrative embodiment, the analysis is done by the debug data collection engine 220 prior to storage of data into the debug data buffer 230, i.e. the data that is stored into the debug data buffer 230 is only the analyzed debug data which may be a subset or a modification of the raw debug data received from the on-chip logic elements, e.g., processor cores 202-206, memory controller 208, crypto controller 207 and PCIe interface 205. In such a case, the debug data buffer 230 may be made smaller in size and the amount of data output on the output debug interfaces 240 may be minimized by outputting the results of the analysis rather than the raw debug data, e.g., the analysis may serve to filter out unwanted debug data or otherwise transform a large set of debug data into a smaller set of debug data. In other illustrative embodiments, the debug data buffer 230 may store the raw debug data and this raw debug data may be likewise output on the output interfaces 240.
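As a purely hypothetical example of analysis performed prior to storage, the sketch below filters a set of raw register values down to those reporting an error before the result is placed in the debug data buffer. The register names, the values, and the rule that a nonzero value indicates an error are assumptions made for illustration; the illustrative embodiments leave the analysis to the code in the code memory 225.

```python
from typing import Dict


def analyze(raw: Dict[str, int]) -> Dict[str, int]:
    """Filter out unwanted debug data: keep only registers with a nonzero value.

    Treating nonzero as "error reported" is an assumption of this sketch.
    """
    return {name: value for name, value in raw.items() if value != 0}


# Hypothetical raw debug data gathered from on-chip logic elements.
raw_debug_data = {
    "core0.fir": 0x00,
    "core1.fir": 0x80,
    "memctl.fir": 0x00,
    "pcie.fir": 0x04,
}

# Only the analyzed subset is stored, so the buffer can be smaller
# than the raw data set.
debug_data_buffer = analyze(raw_debug_data)
```

In this example the buffer holds two entries rather than four, which corresponds to the reduction in buffer size and output volume described above.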

In yet another illustrative embodiment, the debug data buffer 230 may store the raw debug data and this raw debug data may be analyzed by the code in the code memory 225 as executed by the debug data collection engine 220 in response to an operation for outputting the debug data from the debug data buffer 230 to the output interfaces 240. In this way, the debug data buffer 230 may store all of the raw debug data but select portions or transformations of the raw debug data may be output on the output interfaces 240 in accordance with the analysis performed by the debug data collection engine 220.

The output of the debug data, either raw debug data or analyzed debug data that has been either filtered or transformed by the analysis performed by the debug data collection engine 220, may be output on the output interfaces 240 either automatically or in response to a read command received via one or more of the input debug interfaces 270. In general, the input debug interfaces 270 have general access to the internal logic of the processor chip 200 disabled or blocked due to the secure mode of operation in which the processor chip 200 is operating. Moreover, in this secure mode of operation, the input debug interfaces 270 only allow read commands to be input on the input debug interfaces 270 to the debug data buffer 230 and access to other on-chip logic is disabled. Thus, a read command may be received by the debug data buffer 230 via the input debug interfaces 270 but all other logic on the chip 200 is not accessible via the input debug interfaces 270 when in secure operating mode and furthermore, writing to the debug data buffer 230 is not made possible via these input debug interfaces 270.
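The access restriction described above may be modeled, for illustration only, as a command filter in front of the debug data buffer. The class names, the string-based command encoding, and the target names are hypothetical; an actual implementation would be hardware gating on the debug interface rather than software.

```python
class AccessDenied(Exception):
    """Raised when a command is blocked by the secure mode of operation."""


class InputDebugInterface:
    """Software stand-in for the input debug interfaces 270."""

    def __init__(self, debug_data_buffer: bytes, secure_mode: bool = True) -> None:
        self._buffer = debug_data_buffer
        self.secure_mode = secure_mode

    def handle(self, op: str, target: str) -> bytes:
        # In secure mode the only permitted command is a read of the buffer.
        allowed = (op == "read" and target == "debug_data_buffer")
        if self.secure_mode and not allowed:
            raise AccessDenied(f"{op} on {target} is disabled in secure mode")
        if allowed:
            return self._buffer
        raise NotImplementedError("non-secure access paths are outside this sketch")


iface = InputDebugInterface(debug_data_buffer=b"\x80\x04")
data = iface.handle("read", "debug_data_buffer")

# Writes to the buffer and any access to other on-chip logic are rejected.
blocked = []
for op, target in [("write", "debug_data_buffer"),
                   ("read", "core0"),
                   ("write", "core0")]:
    try:
        iface.handle(op, target)
    except AccessDenied:
        blocked.append((op, target))
```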

In response to a read command received via one or more of the input debug interfaces 270, the debug data stored in the debug data buffer 230 may be read out and output via the output interfaces 240. In some illustrative embodiments, the debug data that is read out of the debug data buffer 230 may be the raw debug data collected by the debug data collection engine logic 220 while in others, it may be the filtered/transformed debug data generated as a result of analysis performed by the debug data collection engine logic 220 using the code in code memory 225. Still further, as mentioned above, the data output may be the raw debug data from the debug data buffer 230 which is filtered/transformed by the analysis performed by the debug data collection engine logic 220 prior to the resulting output debug data being output on the output debug interfaces 240.

While the above description assumes that the output of the debug data is initiated in response to an external mechanism (external meaning external to the processor chip 200), such as external debugger hardware/software, an external service processor, or the like, submitting a read command via the input debug interfaces 270, the illustrative embodiments are not limited to such. Rather, the output of the debug data may be initiated in response to internal commands provided by and within the logic of the processor chip itself. For example, the debug data collection engine logic 220 may operate as a secure “post mortem” debugger that may itself debug the processor chip 200 and initiate appropriate operations to resolve the error condition, e.g., disable a particular processor core, provide an output indicative of the source of the error, or the like.

The processor cores 202-206 can be programmed to read the debug data buffer 230 after a restart if they are operable again. They operate as a secure post mortem debugger where a working processor debugs the previous failure that had led to a checkstop. The processor can then decide to use the debug data only internally, to send it through any functional interface, such as PCIe or network directly, or to preprocess it and send the preprocessed data. Thus the debug data buffer 230 allows for analyzing and processing of debug data by a functional processor core 202-206 within the system after its recovery.

Furthermore, the illustrative embodiment shown in FIG. 2 is one example of an implementation of the present invention in which a polling methodology is utilized. That is, the debug data is not output by the debug data buffer until requested by receipt of a read command from either an internal or an external mechanism. However, the illustrative embodiments are not limited to polling methodologies. To the contrary, in other illustrative embodiments, a pushing methodology, or a combination of a pushing and polling methodology, may be implemented without departing from the spirit and scope of the illustrative embodiments.

FIG. 3 is an example block diagram illustrating the primary operational elements of an on-chip debug data collection mechanism in accordance with another illustrative embodiment. As shown in FIG. 3, this illustrative embodiment is similar to the illustrative embodiment depicted in FIG. 2 but with the debug data buffer having been removed. In the pushing methodology, the debug data, either raw or filtered/transformed debug data, is pushed to the output interfaces 240 directly by the debug data collection engine 220 without having to be stored in a debug data buffer to await a read command from an internal/external mechanism. Thus, in the event of an error condition being detected by the on-chip health monitoring logic 250, the on-chip health monitoring logic 250 commands the debug data collection engine logic 220 to extract and collect debug data from the various on-chip logic elements, e.g., processor cores 202-206, memory controller 208, crypto controller 207, PCIe interface 205 and the like. The debug data collection engine logic 220 extracts the data, optionally performs analysis on the extracted debug data according to code stored in code memory 225 and executed by the debug data collection engine logic 220, and then outputs either the raw debug data (if no analysis is done), or the filtered/transformed debug data generated as a result of the analysis, directly to the output interfaces 240 without having to store this data in a debug data buffer and without having to require a read command from the input debug interfaces 270. Thus, essentially, the debug data collection engine logic 220 pushes the debug data to the output interface 240 in response to the detected error condition.
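The pushing methodology may be sketched as follows, again only as an illustration under stated assumptions. The function name push_debug_data, the use of a list as the output interface, the component names, and the masking analysis are all hypothetical.

```python
from typing import Callable, Dict, List, Optional, Tuple


def push_debug_data(components: Dict[str, Callable[[], int]],
                    output_interface: List[Tuple[str, int]],
                    analyze: Optional[Callable[[int], int]] = None) -> None:
    """Extract state from each component and push it straight to the output.

    No debug data buffer is used and no read command is awaited.
    """
    for name, read_state in components.items():
        raw = read_state()
        output_interface.append((name, analyze(raw) if analyze else raw))


# Hypothetical on-chip logic elements, each exposing a readable state value.
components = {"core0": lambda: 0x1F80, "memctl": lambda: 0x0004}

output_interface: List[Tuple[str, int]] = []
push_debug_data(components, output_interface)            # raw data pushed
raw_out = list(output_interface)

output_interface.clear()
push_debug_data(components, output_interface,
                analyze=lambda value: value & 0xFF)      # transformed data pushed
```

When no analysis function is supplied the raw values are pushed; otherwise only the transformed values reach the output.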

It should be appreciated that while FIGS. 2 and 3 illustrate various logic elements of the processor chip 200, these elements are not limited to being implemented entirely in hardware. To the contrary, some aspects of the elements may be implemented as software or firmware in the processor chip 200. For example, the code memory 225 may store, either as software instructions or firmware, the code to be executed by the debug data collection engine logic 220, which itself may be a processor executing debug data collection code, a special purpose hardware or circuit element, or the like.

Once the debug data is output on the output interfaces 240, the data can be used by various external hardware and software mechanisms to achieve various purposes. For example, the external mechanisms may comprise debugging hardware/software that takes the output debug data and determines a source of errors in the chip 200, potential solutions to the errors, potential improvements to the chip 200 operation and/or design, or the like. In one illustrative embodiment, the external mechanisms may identify field replaceable units (FRUs) of the chip that may be replaced to solve the problem leading to the error. Moreover, the external mechanism may comprise a simple debug logging mechanism that logs the debug data for later use by another system to perform various operations.

Thus, the illustrative embodiments provide mechanisms for allowing access to on-chip debug data via debug interfaces of the chip while the chip is operating in a secure mode of operation, i.e. external access via input debug interfaces to internal chip logic for controlling the operation of the chip is generally blocked or disabled. The mechanisms of the illustrative embodiments thus allow debug data to be collected and output by the chip via output debug interfaces even after the chip has been placed in the secure mode of operation, i.e. the chip is “in the field,” with minimal additional on-chip logic required.

FIG. 4 is an example diagram illustrating a process for outputting debug information in a secure chip environment in accordance with one illustrative embodiment. The operation outlined in FIG. 4 may be implemented, for example, on a processor chip using logic that is used to initialize the processor chip to an initial operational state, e.g., POR logic in the depicted example, to perform processor chip health monitoring operations and debug data output. Additional logic may also be provided, e.g., the debug data collection engine logic and debug data buffer, to help facilitate the collection of debug data and output of this debug data through existing output debug interfaces.

As shown in FIG. 4, the operation starts with the POR engine logic performing an initialization sequence to initialize the chip to an operational state (step 410). This initialization may comprise setting up scan chains, processor registers, and the like, to an initial state at which point the POR engine logic would typically go to sleep or stop operating and control is passed over to the processor chip to operate in an operational mode. However, with the mechanisms of the illustrative embodiments, rather than having the POR engine logic go into a stopped or sleep state, the POR engine logic enters a health monitoring mode of operation (step 420) in which the POR engine monitors the health of the chip for any error conditions.

In the health monitoring mode of operation, the POR engine initializes the component from which to extract debug information to a first component, e.g., component 0 (step 430) and then determines if an error condition, e.g., a chip crash/malfunction, has occurred (step 440). Such a determination may be made based on checking one or more status registers of the processor chip which are set in response to logic of the chip encountering various error conditions. For example, the POR engine logic may check a checkstop register to see if a value in this register has been set to a predetermined value indicative of a checkstop condition occurring in one or more of the processor cores or memory controller of the processor chip. If such a condition has occurred, the value in the checkstop register may be set to the predetermined value and the POR engine logic detects this setting as indicative of an error condition, e.g., a crash or malfunction of the chip.

If an error condition is detected (step 440), the POR engine logic determines whether to extract debug data from the current component, e.g., component 0 initially (step 450). If so, then a procedure to extract the relevant debug data is executed (step 460). Optionally, code-driven analysis of the extracted debug data is performed and results of the analysis may be generated (step 470). The results of the analysis may then be dumped to the output debug interface or to an internal debug data buffer 494, or both depending upon the particular implementation (step 480), i.e. outputs 492. Thereafter, or if debug data is not to be extracted from the current component, the current component is incremented (step 490). A determination is made as to whether the current component is equal to n, i.e. all of the components have been checked to determine if debug data should be extracted (step 500). If so, then the operation terminates. Otherwise, the operation returns to step 450 and repeats the operation for the next component.
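The component loop of steps 430 through 500 may be sketched, for illustration only, as a single pass over the components once an error condition has been detected. The function name run_debug_flow and the callback parameters are hypothetical, and the sketch returns immediately when no error is present rather than continuing to monitor.

```python
from typing import Callable, List, Optional, Tuple


def run_debug_flow(components: List[str],
                   error_detected: Callable[[], bool],
                   should_extract: Callable[[str], bool],
                   extract: Callable[[str], str],
                   analyze: Optional[Callable[[str], str]] = None
                   ) -> List[Tuple[str, str]]:
    """One pass through the FIG. 4 flow; returns what would be output."""
    outputs: List[Tuple[str, str]] = []
    current = 0                              # step 430: start at component 0
    n = len(components)
    if not error_detected():                 # step 440: error condition?
        return outputs
    while current != n:                      # step 500: all components checked?
        component = components[current]
        if should_extract(component):        # step 450: extract from this one?
            data = extract(component)        # step 460: extract debug data
            if analyze is not None:
                data = analyze(data)         # step 470: optional analysis
            outputs.append((component, data))  # step 480: dump results
        current += 1                         # step 490: next component
    return outputs


components = ["core0", "core1", "memctl"]
results = run_debug_flow(
    components,
    error_detected=lambda: True,
    should_extract=lambda c: c != "core1",   # hypothetical selection rule
    extract=lambda c: f"{c}-state",
)
no_error = run_debug_flow(components, lambda: False,
                          lambda c: True, lambda c: f"{c}-state")
```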

It should be noted that either the raw extracted debug data or the results of the analysis, or both, may be stored in an on-chip debug data buffer 494 which is read-only from outside the processor chip. Read only inputs 496 may be received from external to the chip and provided to the debug data buffer 494. It should further be noted that all of these operations are performed within a secure envelope 498, e.g., hardware operating in a secure mode of operation, indicating that there is no possibility of manipulation of the chip's internal state. This is accomplished by only using a read-only interface 496 and/or providing the debug data via output only debug data interfaces 492 used by the debug data collection engine logic to push the debug data, or the results of the analysis, out of the chip to external mechanisms.

It should be noted that while the illustrative embodiments are described in terms of a secure processor chip environment implementing a secure envelope, the mechanisms of the illustrative embodiments are not limited to such. To the contrary, the mechanisms of the illustrative embodiments may further be implemented with regard to non-secure processor chips. While security is considerably less in such an implementation, the filtering and analysis of debug data that may be performed by the debug data collection engine 220 and the like, may be useful even in the case of non-secure processor chips.

As noted above, it should be appreciated that the illustrative embodiments may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment containing both hardware and software elements. In one example embodiment, the mechanisms of the illustrative embodiments are implemented in software or program code, which includes but is not limited to firmware, resident software, microcode, etc.

A data processing system suitable for storing and/or executing program code will include at least one processor coupled directly or indirectly to memory elements through a system bus. The memory elements can include local memory employed during actual execution of the program code, bulk storage, and cache memories which provide temporary storage of at least some program code in order to reduce the number of times code must be retrieved from bulk storage during execution.

Input/output or I/O devices (including but not limited to keyboards, displays, pointing devices, etc.) can be coupled to the system either directly or through intervening I/O controllers. Network adapters may also be coupled to the system to enable the data processing system to become coupled to other data processing systems or remote printers or storage devices through intervening private or public networks. Modems, cable modems and Ethernet cards are just a few of the currently available types of network adapters.

The description of the present invention has been presented for purposes of illustration and description, and is not intended to be exhaustive or limited to the invention in the form disclosed. Many modifications and variations will be apparent to those of ordinary skill in the art. The embodiment was chosen and described in order to best explain the principles of the invention, the practical application, and to enable others of ordinary skill in the art to understand the invention for various embodiments with various modifications as are suited to the particular use contemplated.

1-10. (canceled)
 11. A processor chip, comprising: interface logic that provides a communication pathway between internal logic of the processor chip and an external mechanism; hardware logic that places the processor chip into a secure mode of operation in which access to internal logic of the processor chip to control the internal logic of the processor chip, by the external mechanism to the processor chip, is disabled on an interface of the processor chip; health monitoring logic that detects a triggering condition of the processor chip that is a trigger for initiated debug data collection from on-chip logic while the processor chip is in the secure mode of operation; and debug data collection engine that collects debug data from the on-chip logic to generate debug data while the processor chip is in the secure mode of operation, wherein the debug data collection engine generates data based on the debug data and the data is output to an external mechanism via the interface while the processor chip is in the secure mode of operation.
 12. The processor chip of claim 11, wherein the debug data is collected from on-chip hardware devices via a pervasive bus of the processor chip.
 13. The processor chip of claim 11, wherein the health monitoring logic monitors the processor chip to detect an error condition of the processor chip which results in one or more processor cores or critical logic of the processor chip to fail.
 14. The processor chip of claim 11, wherein the health monitoring logic is a power-on reset logic unit of the processor chip that operates in a health monitoring mode of operation after power-on of the processor chip.
 15. The processor chip of claim 11, wherein the data generated based on the debug data is a transformation of the debug data into data representative of a failure within logic of the processor chip based on analysis performed by the debug data collection engine.
 16. The processor chip of claim 15, wherein the analysis performed by the debug data collection engine comprises analysis on the debug data to identify a source of the failure within the logic of the processor chip, and wherein the data generated based on the debug data identifies the source of the failure.
 17. The processor chip of claim 15, wherein the transformation of the debug data into data representative of the failure within logic of the processor chip comprises filtering out unwanted debug data, or transforming the debug data into a smaller set of data representative of the failure, prior to writing the data to an on-chip debug data buffer or an output interface.
 18. The processor chip of claim 11, further comprising a debug data buffer, wherein the debug data collection engine outputs the data generated based on the debug data by: storing the debug data in the debug data buffer; and reading, by the external mechanism, data from the debug data buffer via the interface, wherein the interface provides external access by the external mechanism to only the debug data buffer on the processor chip and does not permit the external mechanism to access other internal logic of the processor chip.
 19. The processor chip of claim 11, further comprising a trigger register, wherein the health monitoring logic detects the triggering condition of the processor chip by polling a value of the trigger register to determine if the trigger register has a value written to the trigger register indicative of a failure in one of a processor core or critical logic of the processor chip.
 20. A computer program product comprising a computer readable storage medium having a computer readable program stored therein, wherein the computer readable program, when executed on a hardware of a processor chip, causes the hardware to: place the processor chip into a secure mode of operation in which access to internal logic of the processor chip to control the internal logic of the processor chip, by mechanisms external to the processor chip, is disabled on an interface of the processor chip; detect a triggering condition of the processor chip that is a trigger for initiated debug data collection from the on-chip logic; perform debug data collection from the on-chip logic to generate debug data; and output, to an external mechanism via the interface, data generated based on the debug data.
 21. An apparatus, comprising: a processor chip comprising one or more processor cores; and a memory coupled to the one or more processor cores of the processor chip, wherein the processor chip comprises: interface logic that provides a communication pathway between internal logic of the processor chip and an external mechanism; hardware logic that places the processor chip into a secure mode of operation in which access to internal logic of the processor chip to control the internal logic of the processor chip, by the external mechanism to the processor chip, is disabled on an interface of the processor chip; health monitoring logic that detects a triggering condition of the processor chip that is a trigger for initiated debug data collection from on-chip logic while the processor chip is in the secure mode of operation; and debug data collection engine that collects debug data from the on-chip logic to generate debug data while the processor chip is in the secure mode of operation, wherein the debug data collection engine generates data based on the debug data and the data is output to an external mechanism via the interface while the processor chip is in the secure mode of operation.